ABSTRACT

Perspectives concerning the validation of
faculty-developed instruments for the assessment of student
performance at Alverno College are presented. Sixteen instruments
were identified by departments for the validation studies. Three
validation strategies were found to work best. One was a pre- and
post-instruction comparison that determined whether changes in student
performance could be attributed to the effects of instruction. A second
strategy was criteria evaluation, which involved the clarification, 
revision, and refinement of criteria based on an analysis of student 
performance. A third approach was the interrater reliability of 
assessor judgments, which enabled a test of reliability as well as 
the development of instrument criteria. Criteria evaluation appeared 
to be most helpful when the instrument was being evaluated and 
revised. Pre- and post-instruction comparisons were used most 
effectively after faculty had judged the instrument as meeting most 
other instrument design guidelines. Interrater reliability studies
were most useful when they were conducted concurrently with criteria
evaluation. The validation studies showed that direct involvement of 
faculty in analyzing student performance data and probing validity 
questions generated a broad scope of validity issues. (Author/SW) 
ABSTRACT



The Alverno College faculty has designed a curriculum and
assessment process to assist students to develop and demonstrate
ability in a variety of competences. Faculty, individually and
as a group, design assessment instruments which then come under
the scrutiny of other faculty in a continuous process of review
and redefinition. This review process stimulates evaluation and
revision of the instruments in a systematic way.

Validating assessment instruments is an unusual goal for a college
faculty to pursue. To validate means that concepts of the
abilities or competences assessed and the means for assessing them
must be carefully thought out, subjected to rigorous reasoning,
and constantly reviewed against student performance outcomes.
This report summarizes questions, suggestions, concerns and insights
generated from feedback sessions with faculty who submitted their
instruments for a validation study. Sixteen instruments were
identified by departments as ready to submit because faculty judged
them sufficiently developed to evaluate. Of the validation strategies
tried, three worked best. One is pre- and post-instruction
comparison, which determines whether changes in student
performance can be attributed to the effects of instruction.
A second is criteria evaluation, which involves the clarification,
revision and refinement of criteria based on an analysis of
student performance. A third is establishing the inter-rater
reliability of assessor judgments, which enables a test of
reliability as well as the development of instrument criteria.
Criteria evaluation appears to be most helpful when the instrument
is being evaluated and revised. Pre- and post-instruction
comparisons are used most effectively after faculty have judged
the instrument as meeting most other instrument design guidelines.
Inter-rater reliability studies are most useful when they are
conducted concurrently with criteria evaluation. The validation
studies synthesized for this report show that direct
involvement of faculty in analyzing student performance data and
probing validity questions generates a broad scope of validity
issues.
VALIDATING ASSESSMENT IN AN OUTCOME-CENTERED LIBERAL ARTS CURRICULUM:
INSIGHTS FROM THE EVALUATION AND REVISION PROCESS



Assessment Committee/Office of Research and Evaluation



Introduction

Validating assessment instruments is an unusual goal for a college to pursue. 
Historically, each college professor has developed most of his or her own instruments 
to assess student performance. An individual professor might employ various methods 
to improve testing instruments, but seldom if ever would he or she submit them to
others for systematic and continuous review. Nor are systematic attempts often made
to compare a student's performance across a number of courses or instruments, or to 
predict future professional success from measures of student performance in college 
courses. 

The Alverno College faculty has set itself the task of assisting students to develop 
and demonstrate ability in a variety of competence areas (e.g. Communications, 
Analysis, Problem Solving, Valuing in Decision-Making, Social Interaction, etc.)
that faculty as a group have chosen through consensus as important to individual
growth and professional performance. They have implemented an assessment system
in which faculty, individually and in groups, design assessment instruments which then
come under the scrutiny of others in a continuous process of review and redefinition.
Quality assurance procedures stimulate evaluation and revision of instruments in a
systematic way. 



Because faculty extended themselves to [...] on assessment, as well as on
specifying goals, they are necessarily [...] questions about the validity
of their instruments or techniques.

What do faculty mean by validating the techniques of assessment? In this liberal
arts college, "to validate" means that concepts of the abilities assessed and the
means for assessing them must be carefully thought out, subjected to rigorous reasoning,
and constantly reviewed. Among the immediate responses to this commitment has been
reliance on the objective judgment available from a variety of sources: faculty
judgment across disciplines and competences; judgments from professionals outside
the institution who also serve as assessors of student performance; and special
interdisciplinary committees, such as the Assessment Committee, set up to generate
objective judgments about individual assessment techniques and to monitor
instrument validation procedures. Faculty have committed themselves to go beyond
content validity, and beyond the evaluation and revision of instruments, to broader
questions of validity, because their own questions encourage it.

Faculty-Conducted Validation Studies: Some Insights

The following report summarizes questions, suggestions, concerns and insights 
generated from feedback sessions with faculty who submitted their instruments for 
a validation study. Sixteen instruments were identified by departments as
"model" instruments, which faculty judged sufficiently developed to validate.



Several validation strategies were employed by faculty:



This paper was [...] in [...] (Ed.), Proceedings of the
Eighth International Congress on the Assessment of [...], Toronto,
Ontario, June 4-6, 19[?]0.
Pre- and Post-Instruction Comparison

This procedure provides information on the extent to which the instructional
process produces changes in students' performance and the extent to which the
instrument under study is effective in measuring such changes.



Criteria Evaluation

Clarification, revision and refinement of criteria based on an analysis of student
performance brings us closer to the intended meaning of the behaviors and abilities
measured, thereby creating a more valid assessment technique.



In our work with "model" instruments that assess generic, developmental and holistic
competences (Alverno College Faculty, Assessment at Alverno College, 1970), we are
often inferring an unobservable "construct" or ability from observed behavior. It
is essential to continue to develop our understanding of the nature of the
competence or ability we teach toward (its "construct validity") by integrating evidence
from different sources of expert judgment. Establishing the inter-rater reliability
of judgments by two or more assessors remains one of the better ways to establish
instrument validity. Comparing our professional judgments stimulates development
of mutual standards as a base for defining instrument criteria.

The model instrument validation studies demonstrate that direct involvement of
faculty in analyzing student performance data and probing validity questions 
generates a broad scope of validity issues. 

An important outcome of some feedback sessions with faculty was the recognition
that many kinds of "validations" will result in a "qualitative" rather than a
quantitative analysis. For example, one faculty member, after comparing each student on a
general pre- and post-assessment based on classroom observations, was able to
identify the number of students who had gained most of the objectives, some of the
objectives, and few if any of the objectives. In another criteria evaluation, a
member of the Assessment Committee simply counted the
number of students who completed each of the objectives, based on data collected
by the instructor during the semester: data the instructor used to record information
for individual feedback and competence validations.
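Both tallies in the paragraph above need nothing beyond counting. The following is a minimal sketch, not the procedure the faculty actually used: it assumes each student's record is a set of objective labels the instructor marked as demonstrated, and the cut-offs separating the three bands ("most", "some", "few if any") are hypothetical values chosen only for illustration.

```python
from collections import Counter


def students_per_objective(records, objectives):
    """Count how many students demonstrated each objective.

    records: hypothetical mapping of student id -> set of objectives the
    instructor recorded as demonstrated during the semester.
    """
    wanted = set(objectives)
    counts = Counter()
    for demonstrated in records.values():
        counts.update(demonstrated & wanted)  # each objective counted once per student
    return {obj: counts[obj] for obj in objectives}  # zero for objectives nobody met


def gain_band(pre, post, objectives, most=0.6, some=0.3):
    """Sort one student into a qualitative band by the share of objectives gained.

    An objective counts as gained if it is demonstrated at post-assessment but
    not at pre-assessment. The cut-offs `most` and `some` are illustrative
    assumptions, not values from the study.
    """
    gained = (post - pre) & set(objectives)
    share = len(gained) / len(objectives)
    if share >= most:
        return "most"
    if share >= some:
        return "some"
    return "few if any"


if __name__ == "__main__":
    objectives = ["A", "B", "C"]  # hypothetical objective labels
    records = {"s1": {"A", "B"}, "s2": {"A"}, "s3": set()}
    print(students_per_objective(records, objectives))  # {'A': 2, 'B': 1, 'C': 0}
    print(gain_band({"A"}, {"A", "B", "C"}, objectives))  # most
```

The counts come straight from data an assessor already records, which is the point the feedback sessions made: no additional data collection is needed.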

Still another important outcome of the feedback sessions, and the information 
relating to validating the model instruments that we have collected so far, is that
most criteria evaluations will not involve collecting more data than we already 
collect in our role as a course "assessor." In general, we learned that not all 
three validation strategies need to be employed concurrently. Criteria evaluation 
appears to be most helpful when the instrument is being evaluated and revised. 
Pre- and post-instruction comparisons are most helpful when faculty have judged
the instrument as generally satisfactory. Inter-rater reliability studies are 
probably most useful when they are used concurrently with criteria evaluation. 

Insights from Pre- and Post-Instruction Comparisons 

In conducting a pre- and post-instruction study to establish the reliability and
validity of an instrument and of the assessment outcomes, we must consider the
composition of the student group involved. One group may be homogeneous
in that students have similar areas of concentration, developmental stage, motivation,
or purpose for pursuing the course. Another group may be more heterogeneous with
respect to these factors. Students may be from a variety of majors, their year
in school may be different, some students may be taking the course primarily to
meet certain validation requirements, etc. Such homogeneity or diversity may
very well be reflected in the kinds of learning experiences they choose after the
pre-assessment, the nature of the post-assessment, and the expectations for
validation. These variations may need to be considered in the overall interpretation
of the student performance data.

Another factor to be considered is the degree to which students are motivated to
perform. It is important to create a comparable motivational effort in both the
pre- and post-assessment. A powerful motivation for pre-assessment is the chance to
test out of a competence level. If students are told to regard the pre-assessment
administration as only a source of information to the instructor, and there is no tangible
benefit to them personally, lack of motivation alone may account for differences
between the pre- and post-assessment.

We also found a need to look at the relationship between the pre- and
post-assessment. Are they comparable with respect to instrument stimulus, to criteria,
to mode of assessment? In some disciplines, where competence is inseparable from
content, it may be impossible to administer the same stimulus. Students are not
yet familiar enough with the complexity of the content during a pre-assessment.
In that case, instructors may decide to administer a different stimulus, with
student performance [...] by the same set of criteria. This [...] that student
performance may vary with the stimulus employed.

Another insight that emerged cautioned us to examine the "route of progress" as
well as the "rate of progress" in comparing performance from a pre- to a
post-assessment. The instrument is powerful diagnostically if progress can be
qualitatively evaluated rather than reported only as all-or-none progress.
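To make the distinction between rate and route concrete: rate of progress can be reduced to one number (how many more criteria are met), while route of progress records which criteria changed and in which direction. This minimal sketch assumes each assessment is judged met/not met against the same set of criteria at pre- and post-assessment; the criterion names are hypothetical, and real instruments may use richer scales than a yes/no judgment.

```python
def rate_of_progress(pre, post):
    """Net change in the number of criteria met: a single 'how much' figure."""
    return sum(post.values()) - sum(pre.values())


def route_of_progress(pre, post):
    """Per-criterion change: which criteria were gained, lost, or unchanged.

    pre, post: mapping of criterion -> True (met) or False (not met), judged
    against the same set of criteria on both occasions.
    """
    route = {}
    for criterion, before in pre.items():
        after = post[criterion]
        if after and not before:
            route[criterion] = "gained"
        elif before and not after:
            route[criterion] = "lost"
        elif after:
            route[criterion] = "met both times"
        else:
            route[criterion] = "not met either time"
    return route


if __name__ == "__main__":
    # Two hypothetical students with the same rate but different routes.
    pre_a = {"states issue": False, "supports claim": False, "organizes": True}
    post_a = {"states issue": True, "supports claim": True, "organizes": True}
    pre_b = {"states issue": True, "supports claim": False, "organizes": False}
    post_b = {"states issue": True, "supports claim": True, "organizes": True}
    print(rate_of_progress(pre_a, post_a), rate_of_progress(pre_b, post_b))  # 2 2
    print(route_of_progress(pre_a, post_a))
    print(route_of_progress(pre_b, post_b))
```

Both hypothetical students gain two criteria, so a rate-only report would treat them as identical; the route shows they progressed on different criteria, which is the diagnostic information the paragraph argues for.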

Insights from Criteria Evaluations

Several questions have emerged as we have discussed criteria evaluations with faculty. 

First, we have found that it is important to ask whether students have enough
opportunity to demonstrate the called-for behaviors. For example, an assessment
technique which asks students to demonstrate a number of competences may provide
less opportunity for the student to be explicit in responding to any one competence.

Second, we found that lack of mastery of one competence may affect student
performance in a related competence. For example, students who did not master 
levels 1 and 2 in Analysis may not reach an accepted level of performance in 
Communications, Listening and Reading, levels 1, 2, 3 and 4. 

Another question we must ask is: Will instruments tend to be more "valid" for 
assessing the effects of instruction if criteria are presented to the student
explicitly? How does this consideration affect criteria from levels 5 and 6, which 
are deliberately more implicit? 

Finally, what is the relationship between criteria and content? Are there content
areas which are more readily integrated with a specific competence? How do criteria
change along with the content of the discipline and situational variations in
administering the instrument?

Insights from Studies of Inter-Rater Reliability of Assessor Judgments

It seems important to further investigate the inter-rater reliability of an 
instrument by giving instruments to practicing professionals off-campus in order 
to learn how they interpret student performance, as compared to the educational 
expert on-campus. Such a procedure may provide an additional measure of the
external validity of an instrument. 

Some faculty members were particularly interested in re-judging student performance
already judged by another instructor (a study of the inter-rater reliability of
assessor judgments) because they wanted to know how closely they would come to
understanding the criteria in the same way. They were also interested in
stimulating additional discussion about the criteria currently under study in
the department.
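The report does not say which statistic, if any, faculty used to compare two assessors' judgments. As one common option (our choice, not the study's), this sketch computes simple percent agreement and Cohen's kappa, which discounts the agreement two assessors would reach by chance given how often each uses each rating. It assumes both assessors rated the same performances on the same categorical scale; the "met"/"not met" labels and the six judgments are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of performances on which two assessors gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both assessors must rate the same non-empty set of performances")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    p_observed = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both assessors independently pick the
    # same category, given how often each of them uses each category.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both assessors use one identical category")
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical met / not met judgments of six performances by two assessors.
    a = ["met", "met", "not met", "met", "not met", "met"]
    b = ["met", "not met", "not met", "met", "not met", "met"]
    print(round(percent_agreement(a, b), 3))  # 0.833
    print(round(cohens_kappa(a, b), 3))  # 0.667
```

The two assessors agree on five of six performances, but because both use "met" often, part of that agreement is expected by chance; kappa reports the agreement beyond chance. Disagreements flagged this way are natural starting points for the criteria discussion the faculty were seeking.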

In the absence of judgments from two assessors, is it possible to make a prediction
as to the consistency with which student performance on the instrument might be
judged by another faculty member from the same discipline? Some individually
designed instruments used within a specific course as a formative assessment
are so specific that it may be difficult to find another assessor to make a
[...] judgment.

Some faculty who conducted a study of the reliability of assessor judgments
began thinking about level 5 and 6 validations and the role of criteria in
eliciting student performance. They asked: What is the relationship of assessors'
judgments to explicit vs. implicit criteria? Will judgments be consistent with
the same set of criteria when one is defined in an open-ended way and the other
is more directive? For example, can one assessor's judgment be expected to be
similar to the judgment of another assessor if the criteria are defined explicitly
vs. implicitly? Presumably, if the criteria are directive, they may elicit
performance that differs from that elicited by criteria left implicit in the
instrument directions to students.

In conclusion, this first group of faculty-conducted validation studies provided
important insights for future work. At each step of the way, we have come to
recognize the importance of group effort in pursuing validation issues. Insights
from one department assist another. More important, we are finding instrument
validation easier in some ways than we originally thought. Individually, faculty
have a great deal of experience in instrument "validation," even though they may
call it something else in their own minds. Sharing ideas in these feedback
sessions with faculty has clarified our thinking and has supported our efforts
to continue to pursue validation issues with our non-traditional assessment
techniques.
CRITERIA FOR ASSESSMENT INSTRUMENTS



1. Does the instrument elicit and measure the complex abilities designated 
for competence level(s) within a defined context? 

2. Does the instrument elicit the fullest expression of student ability at 
that level in that context? 

3. Does the instrument require the use of substantive content commensurate
with the level of sophistication of the ability? 

4. Does the instrument integrate previous levels of the competence and
require the student to demonstrate, at higher levels, an increasingly
sophisticated use of the abilities of lower levels?

5. Does the instrument elicit a range of performance?

6. On a scale from discrete to fully integrated, does the instrument reflect
the appropriate level of integration of dimensions of performance 
(content with competence; among competences)? 

7. Does the instrument involve a production task rather than a recognition task?

8. Does the instrument use an assessment mode that recognizes the intrinsic
nature of the ability being assessed? 

9. Does the instrument allow for the judgment of performance against public 
and explicit criteria? 

10. Does the instrument assess the student's ability to self-assess? 

11. Does the instrument allow for assessment of the student's performance
external to the learning situation? 

12. Does the instrument elicit performance with sufficient data to provide 
for diagnostic, structured feedback to the student on her strengths and 
weaknesses? 



13. Do the instrument criteria provide evidence for credentialing performance?